9729-1-1/CCCCC2
SECRETORY EXPRESSION IN EUKARYOTES 

BACKGROUND OF THE INVENTION 
Field of the Invention
Hybrid DNA technology has revolutionized the
ability to produce polypeptides of an infinite variety
of compositions. Since living forms are composed of
proteins and employ proteins for regulation, the
ability to duplicate these proteins at will offers
unique opportunities for investigating the manner in
which these proteins function and the use of such
proteins, fragments of such proteins, or analogs in
therapy and diagnosis.

There have been numerous advances in improving
the rate and amount of protein produced by a cell.
Most of these advances have been associated with higher
copy numbers, more efficient promoters, and means for
reducing the amount of degradation of the desired
product. It is evident that it would be extremely
desirable to be able to secrete polypeptides of interest,
where such polypeptides are the product of interest.

Furthermore, in many situations, the polypeptide
of interest does not have an initial methionine
amino acid. This is usually a result of there being a
processing signal in the gene encoding for the polypeptide
of interest, which the gene source recognizes and
cleaves with an appropriate peptidase. Since in most
situations, genes of interest are heterologous to the
host in which the gene is to be expressed, such processing
occurs imprecisely and in low yield in the expression
host. In this case, while the protein which is
obtained will be identical to the peptide of interest
for almost all of its sequence, it will differ at the
N-terminus, which can deleteriously affect physiological
activity.

There are, therefore, many reasons why it
would be extremely advantageous to prepare DNA sequences
which would encode for the secretion and
maturing of the polypeptide product. Furthermore,
where sequences can be found for processing, which
result in the removal of amino acids superfluous to the
polypeptide of interest, the opportunity exists for
having a plurality of DNA sequences, either the same or
different, in tandem, which may be encoded on a single
transcript.

Description of the Prior Art 

U.S. Patent No. 4,336,336 describes for prokaryotes
the use of a leader sequence coding for a non-cytoplasmic
protein normally transported to or beyond
the cell surface, resulting in transfer of the fused
protein to the periplasmic space. U.S. Patent No.
4,338,397 describes for prokaryotes using a leader
sequence which provides for secretion with cleavage of
the leader sequence from the polypeptide sequence of
interest. U.S. Patent No. 4,338,397, columns 3 and 4,
provides useful definitions, which definitions are
incorporated herein by reference.

Kurjan and Herskowitz, Cell (1982) 30:933-943
describes a putative α-factor precursor containing four
tandem copies of mature α-factor, describing the
sequence and postulating a processing mechanism.
Kurjan and Herskowitz, Abstracts of Papers presented at
the 1981 Cold Spring Harbor meeting on The Molecular
Biology of Yeasts, page 242, in an Abstract entitled,
"A Putative α-Factor Precursor Containing Four Tandem
Repeats of Mature α-Factor," describe the sequence
encoding for the α-factor and spacers between two of
such sequences. Blair et al., Abstracts of Papers,
ibid, page 243, in an Abstract entitled "Synthesis and
Processing of Yeast Pheromones: Identification and
Characterization of Mutants That Produce Altered
α-Factors," describe the effect of various mutants on
the production of mature α-factor.

SUMMARY OF THE INVENTION 

Methods and compositions are provided for
producing mature polypeptides. DNA constructs are
provided which join the DNA fragments encoding for a
yeast leader sequence and processing signal to heterologous
genes for secretion and maturation of the polypeptide
product. The construct of the DNA encoding for
the N-terminal cleavable oligopeptide and the DNA
sequence encoding for the mature polypeptide product
can be joined to appropriate vectors for introduction
into yeast or other cell which recognizes the processing
signals for production of the desired polypeptide.
Other capabilities may also be introduced into the
construct for various purposes.

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a flow diagram indicating the
construction of pYαEGF-21.

Fig. 2 shows sequences at fusions of hEGF to
the vector; a. through e. show the sequences at the
N-terminal region of hEGF, which differ among several
constructions, and f. shows the C-terminal region of
hEGF.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 

In accordance with the subject invention,
eukaryotic hosts, particularly yeast, are employed for
the production of mature polypeptides, where such
polypeptides may be harvested from a nutrient medium.
The polypeptides are produced by employing a DNA
construct encoding for yeast leader and processing
signals joined to a polypeptide of interest, which may
be a single polypeptide or a plurality of polypeptides
separated by processing signals. The resulting
construct encodes for a pre-pro-polypeptide which will
contain the signals for secretion of the pre-pro-polypeptide
and processing of the polypeptide, either
intracellularly or extracellularly, to the mature
polypeptide.

The constructs of the subject invention will, for example,
have at least the following formula defining a
pro-polypeptide:

((R)_r-(GAXYCX)_n-Gene*)_y

wherein:

R is CGX or AZZ, the codons coding for lysine
and arginine, each of the Rs being the same or different;

r is an integer of from 2 to 4, usually 2 to
3, preferably 2 or 4;

X is any of the four nucleotides, T, G, C, or A;

Y is G or C;

y is an integer of at least one and usually
not more than 10, more usually not more than four,
providing for monomers and multimers;

Z is A or G; and

Gene* is a gene other than α-factor, usually
foreign to a yeast host, usually a heterologous gene,
desirably a plant or mammalian gene;

n is 0 or an integer which will generally
vary from 1 to 4, usually 2 to 3.
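
The degenerate codon notation used for R can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code; the names CODON_TABLE, SYMBOLS, expand and residues are illustrative helpers and are not part of the specification. It expands CGX and AZZ into concrete codons and lists the amino acids they encode.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[i]
               for i, (a, b, c) in enumerate(product(BASES, repeat=3))}

# Degenerate symbols as defined in the formula: X = any base, Z = A or G.
SYMBOLS = {"X": "TGCA", "Z": "AG", "A": "A", "C": "C", "G": "G", "T": "T"}


def expand(pattern):
    """Return every concrete codon matched by a degenerate pattern."""
    return ["".join(p) for p in product(*(SYMBOLS[s] for s in pattern))]


def residues(pattern):
    """Return the set of one-letter amino acids encoded by a pattern."""
    return {CODON_TABLE[codon] for codon in expand(pattern)}


print(sorted(expand("AZZ")))            # concrete codons covered by AZZ
print(residues("CGX"), residues("AZZ"))  # R = arginine, K = lysine
```

Under these assumptions CGX covers only arginine codons, while AZZ covers two lysine and two arginine codons, which agrees with the definition of R.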

The pro-polypeptide has an N-terminal processing
signal for peptidase removal of the amino acids
preceding the amino acids encoded for by Gene*.

For the most part, the constructs of the
subject invention will have at least the following
formula:

L-(R-S-(GAXYCX)_n-Gene*)_y

defining a pre-pro-polypeptide, wherein all
the symbols except L and S have been defined, S having
the same definition as R, there being 1 R and 1 S, and L
is a leader sequence providing for secretion of the
pre-pro-polypeptide. While it is feasible to have more
Rs and Ss, there will usually be no advantage in the
additional amino acids. Any leader sequence may be
employed which provides for secretion, leader sequences
generally being of about 30 to 120 amino acids, usually
about 30 to 100 amino acids, having a hydrophobic
region and having a methionine at its N-terminus.

The construct when n is 0 will have the
following formula:

L-((R)_r'-Gene*)_y

defining a pre-pro-polypeptide, wherein all the symbols
have been defined previously, except r', wherein:

r' is 2 to 4, preferably 2 or 4.
Of particular interest is the leader sequence
of α-factor which is described in Kurjan and Herskowitz,
supra, on page 937 or fragments or analogs
thereof, which provide for efficient secretion of the
desired polypeptides. Furthermore, the DNA sequence
indicated in the article, which sequence is incorporated
herein by reference, is not essential, any sequence
which encodes for the desired oligopeptide being
sufficient. Different sequences will be more or less
efficiently translated.

While the above formulas are preferred, it
should be understood that with suppressor mutants,
other sequences could be provided which would result in
the desired function. Normally, suppressor mutants are
not as efficient for expression and, therefore, the
above indicated sequence or equivalent sequence encoding
for the same amino acid sequence is preferred. To the
extent that a mutant will express from a different
codon the same amino acids which are expressed by the
above sequence, then such alternative sequence could be
permitted.

The dipeptides which are encoded for by the
sequence in the parenthesis will be an acidic amino
acid, aspartic or glutamic, preferably glutamic,
followed by a neutral amino acid, alanine or proline,
particularly alanine.
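
The dipeptides permitted by the pattern GAXYCX can be enumerated directly. This is a minimal Python sketch, assuming the standard genetic code and the symbol definitions given for X and Y; the helper names are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[i]
               for i, (a, b, c) in enumerate(product(BASES, repeat=3))}

# X = any of the four nucleotides, Y = G or C, as defined for the formula.
SYMBOLS = {"X": "TGCA", "Y": "GC", "G": "G", "A": "A", "C": "C"}


def dipeptides(pattern="GAXYCX"):
    """Return the set of dipeptides (one-letter codes) encoded by the pattern."""
    found = set()
    for bases in product(*(SYMBOLS[s] for s in pattern)):
        seq = "".join(bases)
        found.add(CODON_TABLE[seq[:3]] + CODON_TABLE[seq[3:]])
    return found


print(sorted(dipeptides()))             # D/E followed by A/P
print(CODON_TABLE["GAA"] + CODON_TABLE["GCT"])  # the GAAGCT case
```

The enumeration yields only Asp or Glu in the first position and Ala or Pro in the second, and GAAGCT corresponds to Glu-Ala.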

In providing for useful DNA sequences which
can be used for cassettes for expression, the following
sequence can be conveniently employed:

Tr-L-((R-S)_r''-(GAXYCX)_n'-W-(Gene*)_d)_y

wherein:

Tr intends a DNA sequence encoding for the
transcriptional regulatory signals, particularly the
promoter and such other regulatory signals as operators,
activators, cap signal, signals enhancing ribosomal
binding, or other sequence involved with transcriptional
or translational control. The Tr sequence will generally
be at least about 100bp and not more than about 2000bp.
Particularly useful is employing the Tr sequence
associated with the leader sequence L, so that a DNA
fragment can be employed which includes the transcriptional
and translational signal sequences associated
with the leader sequence endogenous to the host.
Alternatively, one may employ other transcriptional and
translational signals to provide for enhanced production
of the expression product;

d is 0 or 1, being 1 when y is greater than 1;

n' is a whole number, generally ranging from
0 to 3, more usually being 0 or 2 to 3;

r'' is 1 or 2;

W intends a terminal deoxyribosyl-3' group,
or a DNA sequence which by itself or, when n' is other
than 0, in combination with the nucleotides to which it
is joined, defines a restriction site, having either
a cohesive end or butt end, wherein W may have from 0
to about 20 nucleotides in the longest chain;

the remaining symbols having been defined
previously.

Of particular interest is the following
construct:

(Tr)_a-L-(R-S)_r''-(GAXYCX)_n''-GA[AGCT]

wherein:

all of the symbols previously defined have
the same definition;

a is 0 or 1, intending that the construct may
or may not have the transcriptional and translational
signals;

the nucleotides indicated in the broken box
(shown here in brackets) are intended not to be present
but to be capable of addition by adding a HindIII
cleaved terminus to provide for the recreation of the
sequence encoding for a dipeptide; and

n'' will be 0 to 2, where at least one of the
Xs and Ys defines a nucleotide, so that the sequence in
the parenthesis is other than the sequence GAAGCT.

The coding sequence of Gene* may be joined to
the terminal T, providing that the coding sequence is
in frame with the initiation codon and, upon processing,
the first amino acid will be the correct amino acid for
the mature polypeptide.
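
The in-frame requirement can be illustrated with a short Python sketch. The upstream sequence below is a hypothetical toy leader, not the α-factor leader, followed by Lys-Arg (AAAAGA) and Glu-Ala (GAAGCT) codons; the gene fragment uses the opening hEGF codons as they appear in the primer sequences quoted in the Experimental section. The function names are illustrative assumptions.

```python
def frame_offset(upstream):
    """Bases by which a gene ligated after `upstream` is shifted (0 = in frame).

    `upstream` is the coding DNA from the initiation codon through the
    last base of the processing signal.
    """
    if not upstream.startswith("ATG"):
        raise ValueError("upstream must begin with the initiation codon ATG")
    return len(upstream) % 3


def read_codons(upstream, gene):
    """Split the fused sequence into codons as read from the initiation codon."""
    fused = upstream + gene
    return [fused[i:i + 3] for i in range(0, len(fused) - 2, 3)]


# Hypothetical toy leader + Lys-Arg + Glu-Ala processing codons.
TOY_UPSTREAM = "ATGAGATTTCCT" + "AAAAGA" + "GAAGCT"
GENE_START = "AACTCCGACTCC"  # Asn-Ser-Asp-Ser codons

print(frame_offset(TOY_UPSTREAM))                   # in-frame junction
print(read_codons(TOY_UPSTREAM, GENE_START)[-4:])   # gene codons read intact
print(frame_offset(TOY_UPSTREAM + "G"))             # one stray base shifts the frame
```

With an offset of 0 the gene's own codons are read unchanged after the processing signal; any other offset would alter every downstream amino acid.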

The 3'-terminus of Gene* can be manipulated
much more easily and, therefore, it is desirable to
provide a construct which allows for insertion of Gene*
into a unique restriction site in the construct. Such
a construct would provide for a restriction site with
insertion of the Gene* into the restriction site to be
in frame with the initiation codon. Such a construction
can be symbolized as follows:

(Tr)_a-L-(R-S)_r''-(GAXYCX)_n''-W-(SC)_b-Te

wherein:

those symbols previously defined have the
same definition;

SC are stop codons;

Te is a termination sequence balanced with
the promoter Tr, and may include other signals, e.g.
polyadenylation; and

b is an integer which will generally vary
from about 0 to 4, more usually from 0 to 3, it being
understood that Gene* may include its own stop codons.

Illustrative of a sequence having the above
formula is where W is the sequence GA and n'' is 2.

Of particular interest is where the sequence
encoding the terminal dipeptide is taken together with
W to define a linker or connector, which allows for
recreation of the terminal sequence defining the
dipeptide of the processing signal and encodes for the
initial amino acids of Gene*, so that the codons are in
frame with the initiation codon of the leader. The
linker provides for a staggered or butt ended termination,
desirably defining a restriction site in conjunction
with the successive sequences of the Gene*. Upon
ligation of the linker with Gene*, the codons of Gene*
will be in frame with the initiation codon of the
leader. In this manner, one can employ a synthetic
sequence which may be joined to a restriction site in
the processing signal sequence to recreate the processing
signal, while providing the initial bases of the
Gene* encoding for the N-terminal amino acids. By
employing a synthetic sequence, the synthetic linker
can be a tailored connector having a convenient restriction
site near the 3'-terminus and the synthetic
connector will then provide for the necessary codons
for the 5'-terminus of the gene.

Alternatively, one could introduce a restriction
endonuclease recognition site downstream from the
processing signal to allow for cleavage and removal of
superfluous bases to provide for ligation of the Gene*
to the processing signal in frame with the initiation
codon. The first codon would encode for the
N-terminal amino acid of the polypeptide. Where T is
the first base of Gene*, one could introduce a restriction
site where the recognition sequence is downstream
from the cleavage site. For example, a Sau3A recognition
sequence could be introduced immediately after the
processing signal, which would allow for cleavage and
linking of the Gene* with its initial codon in frame
with the leader initiation codon. With restriction
endonucleases which have the recognition sequence
distal and downstream from the cleavage site, e.g. HgaI,
W could define such sequence, which could include a
portion of the processing signal sequences. Other
constructions can also be employed, employing such
techniques as primer repair and in vitro mutagenesis to
provide for the convenient insertion of Gene* into the
construct by introducing an appropriate restriction
site.

The construct provides a portable sequence
for insertion into vectors, which provide the desired
replication system. As already indicated, in some
instances, it may be desirable to replace the wild type
promoter associated with the leader sequence with a
different promoter. In yeast, promoters involved with
enzymes in the glycolytic pathway can provide for high
rates of transcription. These promoters are associated
with such enzymes as phosphoglucoisomerase,
phosphofructokinase, phosphotriose isomerase,
phosphoglucomutase, enolase, pyruvic kinase,
glyceraldehyde-3-phosphate dehydrogenase, and alcohol
dehydrogenase. These promoters may be inserted upstream
from the leader sequence. The 5'-flanking region to the
leader sequence may be retained or replaced with the
3'-sequence of the alternative promoter. Vectors can be
prepared and have been reported which include promoters
having convenient restriction sites downstream from the
promoter for insertion of such constructs as described
above.

The final construct will be an episomal
element capable of stable maintenance in a host,
particularly a fungal host such as yeast. The construct
will include one or more replication systems, desirably
two replication systems, allowing for maintenance in
the expression host and cloning in a prokaryote. In
addition, one or more markers for selection will be
included, which will allow for selective pressure for
maintenance of the episomal element in the host.
Furthermore, the episomal element may be a high or low
copy number, the copy number generally ranging from
about 1 to 200. With high copy number episomal elements,
there will generally be at least 10, preferably at
least 20, and usually not exceeding about 150, more
usually not exceeding about 100 copy number. Either
high or low copy numbers may be desirable, depending
upon the Gene* and the effect of the episomal
element on the host. Where the presence of the expression
product of the episomal element may have a deleterious
effect on the viability of the host, a low copy
number may be indicated.

Various hosts may be employed, particularly
mutants having desired properties. It should be
appreciated that depending upon the rate of production
of the expression product of the construct, the processing
enzyme may or may not be adequate for processing
at that level of production. Therefore, a mutant
having enhanced production of the processing enzyme may
be indicated or enhanced production of the enzyme may
be provided by means of an episomal element. Generally,
the production of the enzyme should be of a lower order
than the production of the desired expression product.

Where one is using α-factor for secretion and
processing, it would be appropriate to provide for
enhanced production of the processing enzyme Dipeptidyl
Amino Peptidase A, which appears to be the expression
product of STE13. This enzyme appears to be specific
for X-Ala- and X-Pro-sequences, where X in this instance
intends an amino acid, particularly the dicarboxylic
acid amino acids.

Alternatively, there may be situations where
intracellular processing is not desired. In this
situation, it would be useful to have a ste13 mutant,
where secretion occurs, but the product is not processed.
In this manner, the product may be subsequently
processed in vitro.

Host mutants which provide for controlled
regulation of expression may be employed to advantage.
For example, with the constructions of the subject
invention where a fused protein is expressed, the
transformants have slow growth which appears to be a
result of toxicity of the fused protein. Thus, by
inhibiting expression during growth, the host may be
grown to high density before changing the conditions to
permissive conditions for expression.

A temperature-sensitive sir mutant may be
employed to achieve regulated expression. Mutation in
any of the SIR genes results in a non-mating phenotype
due to in situ expression of the normally silent MATa
and MATα sequences present at the HML and HMR loci.

Furthermore, as already indicated, the Gene*
may have a plurality of sequences in tandem, either the
same or different sequences, with intervening processing
signals. In this manner, the product may be processed
in whole or in part, with the result that one will
obtain the various sequences either by themselves or in
tandem for subsequent processing. In many situations,
it may be desirable to provide for different sequences,
where each of the sequences is a subunit of a particular
protein product.

The Gene* may encode for any type of polypeptide
of interest. The polypeptide may be as small as
an oligopeptide of 8 amino acids or may be 100,000
daltons or higher. Usually, single chains will be less
than about 300,000 daltons, more usually less than
about 150,000 daltons. Of particular interest are
polypeptides of from about 5,000 to 150,000 daltons,
more particularly of about 5,000 to 100,000 daltons.
Illustrative polypeptides of interest include hormones
and factors, such as growth hormone, somatomedins,
epidermal growth factor, the endocrine secretions, such
as luteinizing hormone, thyroid stimulating hormone,
oxytocin, insulin, vasopressin, renin, calcitonin,
follicle stimulating hormone, prolactin, etc.;
hematopoietic factors, e.g. erythropoietin, colony
stimulating factor, etc.; lymphokines; globins;
globulins, e.g. immunoglobulins; albumins; interferons,
such as α, β and γ; repressors; enzymes; endorphins,
e.g. β-endorphin, enkephalin, dynorphin, etc.
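
Because the size ranges above mix residue counts and daltons, a rough conversion may help. The sketch below assumes an average residue mass of about 110 daltons, a common rule of thumb that is not taken from the specification; actual masses depend on composition, and the function names are illustrative.

```python
AVG_RESIDUE_DA = 110.0  # assumed average mass per amino acid residue


def approx_mass_da(n_residues):
    """Rough polypeptide mass in daltons from a residue count."""
    return n_residues * AVG_RESIDUE_DA


def approx_residues(mass_da):
    """Rough residue count from a mass in daltons."""
    return round(mass_da / AVG_RESIDUE_DA)


print(approx_mass_da(8))          # the 8-residue oligopeptide lower bound
print(approx_mass_da(53))         # a 53-residue polypeptide such as hEGF
print(approx_residues(5_000), approx_residues(150_000))
```

On this estimate the range of about 5,000 to 150,000 daltons corresponds to chains of very roughly 45 to 1,400 residues.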

Having prepared the episomal elements containing
the constructs of this invention, one may then
introduce such element into an appropriate host. The
manner of introduction is conventional, there being a
wide variety of ways to introduce DNA into a host.
Conveniently, spheroplasts are prepared employing the
procedure of, for example, Hinnen et al., PNAS USA
(1978) 75:1919-1933 or Stinchcomb et al., EP 0 045 573
A2. The transformants may then be grown in an appropriate
nutrient medium, where appropriate maintaining
selective pressure on the transformants. Where expression
is inducible, one can allow for growth of the
yeast to high density and then induce expression. In
those situations where a substantial proportion of the
product may be retained in the periplasmic space, one
can release the product by treating the yeast cells
with an enzyme such as zymolase or lyticase.

The product may be harvested by any convenient
means, purifying the protein by chromatography,
electrophoresis, dialysis, solvent-solvent extraction,
etc.

In accordance with the subject invention, one
can provide for secretion of a wide variety of polypeptides,
so as to greatly enhance product yield, simplify
purification, minimize degradation of the desired
product, and simplify processing, equipment, and
engineering requirements. Furthermore, utilization of
nutrients based on productivity can be greatly enhanced,
so that more economical and more efficient production
of polypeptides may be achieved. Also, the use of
yeast has many advantages in avoiding enterotoxins,
which may be present with prokaryotes, and employing
known techniques, which have been developed for yeast
over long periods of time, which techniques include
isolation of yeast products.

The following examples are offered by way of 
illustration and not by way of limitation. 

EXPERIMENTAL 
A synthetic sequence for human epidermal
growth factor (EGF) based on the amino acid sequence of
EGF reported by H. Gregory and B.M. Preston, Int. J.
Peptide Protein Res. 9, 107-118 (1977) was prepared,
which had the following sequence.


5' MCTCCGACTCCGAATCTCCATTGTCCCACGACGGTTACTGTTTGCACGACGGTGTTTGT , 
3' TTGAGGCTGAGGCTTACAGGTAACAGGGTGCTGCCMTGACAMCCT

ATGTACIATCGMGCTTTGGACAAGTACGCTTGTAACTGTGTTGTTGGTTACATCGGTGAA 
TACATGTAGCTICGAMCCTGTTCATGCGAACATTGACACAACAACCAATGTAGCCACTT 

AGATGTCAATACAGAGACTTGAAGTGGTGGGAATTGAGATGA , 
TCTACAGTTATGTCTCTGAACTTCACCACCCTTAACTCTACT , 

where 5' indicates the promoter proximal end of the
sequence. The sequence was inserted into the EcoRI
site of pBR328 to produce a plasmid p328EGF-1 and
cloned.

Approximately 30µg of p328EGF-1 was digested
with EcoRI and approximately 1µg of the expected 190
base pair EcoRI fragment was isolated. This was
followed by digestion with the restriction enzyme HgaI.
Two synthetic oligonucleotide connectors HindIII-HgaI
and HgaI-SalI were then ligated to the 159 base pair
HgaI fragment. The HgaI-HindIII linker had the following
sequence:

AGCTGAAGCT
    CTTCGATTGAG

This linker restores the α-factor processing signals
interrupted by the HindIII digestion and joins the HgaI
end at the 5'-end of the EGF gene to the HindIII end
of pAB112.

The HgaI-SalI linker had the following
sequence:

TGAGATGATAAG
     ACTATTCAGCT

This linker has two stop codons and joins the HgaI end
at the 3'-end of the EGF gene to the SalI end of
pAB112.
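
The two stop codons in the HgaI-SalI linker can be located by scanning its upper strand in each reading frame. This is a minimal Python sketch; it assumes the upper strand is written 5' to 3' as printed, and the function name is illustrative. Which frame corresponds to the EGF reading frame is not stated in the text, so the scan simply reports all three.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq):
    """Map each reading frame (0, 1, 2) to the stop codons found in it, in order."""
    result = {}
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        result[frame] = [codon for codon in codons if codon in STOPS]
    return result


LINKER_TOP = "TGAGATGATAAG"  # upper strand of the HgaI-SalI linker as printed
print(stops_by_frame(LINKER_TOP))
```

One frame (offset 2) reads AGA followed by the adjacent stops TGA and TAA, which is consistent with the statement that the linker carries two stop codons.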

The resulting 181 base pair fragment was
purified by preparative gel electrophoresis and ligated
to 100ng of pAB112 which had been previously completely
digested with the enzymes HindIII and SalI. Surprisingly,
a deletion occurred where the codons for the 3rd and 4th
amino acids of EGF, asp and ser, were deleted, with the
remainder of the EGF being retained.
pAB112 is a plasmid containing a 1.75kb EcoRI
fragment with the yeast α-factor gene cloned in the
EcoRI site of pBR322 in which the HindIII and SalI
sites had been deleted (pAB11). pAB112 was derived from
plasmid pAB101 which contains the yeast α-factor gene
as a partial Sau3A fragment cloned in the BamHI site of
plasmid YEp24. pAB101 was obtained by screening a
yeast genomic library in YEp24 using a synthetic 20-mer
oligonucleotide probe (3'-GGCCGCTTOGTTACfiTGATT-5')
homologous to the published α-factor coding region
(Kurjan and Herskowitz, Abstracts 1981 Cold Spring
Harbor meeting on the Molecular Biology of Yeasts,
page 242).

The resulting mixture was used to transform
E. coli HB101 cells and plasmid pAB201 obtained.
Plasmid pAB201 (5µg) was digested to completion with
the enzyme EcoRI and the resulting fragments were:
a) filled in with DNA polymerase I Klenow fragment;
b) ligated to an excess of BamHI linkers; and
c) digested with BamHI. The 1.75kbp EcoRI fragment was
isolated by preparative gel electrophoresis and
approximately 100ng of the fragment was ligated to
100ng of pC1/1, which had been previously digested to
completion with the restriction enzyme BamHI and
treated with alkaline phosphatase.

Plasmid pC1/1 is a derivative of pJDB219,
Beggs, Nature (1978) 275:104, in which the region
corresponding to bacterial plasmid pMB9 in pJDB219 has
been replaced by pBR322 in pC1/1. This mixture was
used to transform E. coli HB101 cells. Transformants
were selected by ampicillin resistance and their
plasmids analyzed by restriction endonucleases. DNA
from one selected clone (pYEGF-8) was prepared and used
to transform yeast AB103 cells. Transformants were
selected by their leu+ phenotype.

Fifty milliliter cultures of yeast strain
AB103 (α, pep 4-3, leu 2-3, leu 2-112, ura 3-52, his
4-580) transformed with plasmid pYEGF-8 (deposited at
the American Type Culture Collection on 5th January
1983 and given ATCC Accession no. 20658) were grown at
30° in -leu medium to saturation (optical density at
600nm of 5) and left shaking at 30° for an additional
12 hr period. Cell supernatants were collected by
centrifugation and analyzed for the presence of human
EGF using the fibroblast receptor competition binding
assay. The assay of EGF is based on the ability of
both mouse and human EGF to compete with 125I-labeled
mouse EGF for binding sites on human foreskin
fibroblasts. Standard curves can be obtained by measuring
the effects of increasing quantities of EGF on the
binding of a standard amount of 125I-labeled mouse EGF.
Under these conditions 2 to 20 ng of EGF are readily
measurable. Details on the binding of 125I-labeled
epidermal growth factor to human fibroblasts have been
described by Carpenter et al., J. Biol. Chem. 250, 4297
(1975). Using this assay it is found that the culture
medium contains 7±1mg of human EGF per liter.

For further characterization, human EGF
present in the supernatant was purified by absorption
to the ion-exchange resin Biorex-70 and elution with
10mM HCl in 80% ethanol. After evaporation of the HCl
and ethanol the EGF was solubilized in water. This
material migrates as a single major protein of MW
approx. 6,000 in 17.5% SDS gels, roughly the same as
authentic mouse EGF (MW~6,000). This indicates that
the α-factor leader sequence has been properly excised
during the secretion process. Analysis by high resolution
liquid chromatography (microbondapak C18, Waters
column) indicates that the product migrates with a
retention time similar to an authentic mouse EGF
standard. However, protein sequencing by Edman
degradation showed that the N-terminus retained the
glu-ala sequence.

A number of other constructions were prepared
using different constructions for joining hEGF to the
α-factor secretory leader sequence, providing for
different processing signals and site mutagenesis. In
Fig. 2, a. through e. show the sequence of the fusions at
the N-terminal region of hEGF, which sequences differ
among several constructions; f. shows the sequences at
the C-terminal region of hEGF, which is the same for all
constructions. Synthetic oligonucleotide linkers used
in these constructions are boxed.

These fusions were made as follows. Construction
(a) was made as described above. Construction (b)
was made in a similar way except that linker 2 was used
instead of linker 1. Linker 2 modifies the α-factor
processing signal by inserting an additional processing
site (ser-leu-asp-lys-arg) immediately preceding the
hEGF gene. The resulting yeast plasmid is named
pYαEGF-22. Construction (c), in which the dipeptidyl
aminopeptidase maturation site (glu-ala) has been removed,
was obtained by in vitro mutagenesis of construction
(a). A PstI-SalI fragment containing the α-factor
leader-hEGF fusion was cloned in phage M13 and isolated
in a single-stranded form. A synthetic 31-mer of
sequence 5'-TCTTTGGATAAAAGAAACTCCGACTCCCG-3' was
synthesized and 70 picomoles were used as a primer for
the synthesis of the second strand from 1 picomole of
the above template by the Klenow fragment of DNA
polymerase. After fill-in and ligation at 14° for 18
hrs, the mixture was treated with nuclease (5 units
for 15 min) and used to transfect E. coli JM101 cells.
Bacteriophage containing DNA sequences in which the
region coding for (glu-ala) was removed were located by
filter plaque hybridization using the 32P-labeled
primer as probe. RF DNA from positive plaques was
isolated, digested with PstI and SalI and the resulting
fragment inserted in pAB114 which had been previously
digested to completion with SalI and partially with
PstI and treated with alkaline phosphatase.
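
The effect of the construction (c) primer can be seen by translating it. The sketch below is a minimal Python illustration, assuming the standard genetic code and assuming the primer is read in frame from its first base (the text does not state the frame); only the first 27 bases of the primer as printed are used, and the helper names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[i]
               for i, (a, b, c) in enumerate(product(BASES, repeat=3))}


def translate(dna):
    """Translate DNA from its first base, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))


# First 27 bases of the construction (c) primer as printed.
PRIMER_PREFIX = "TCTTTGGATAAAAGAAACTCCGACTCC"
print(translate(PRIMER_PREFIX))  # SLDKRNSDS
```

Read this way, ser-leu-asp-lys-arg is followed directly by asn-ser-asp-ser, with no glu-ala codons between the lys-arg site and the first hEGF codons.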

The plasmid pAB114 was derived as follows:
plasmid pAB112 was digested to completion with HindIII
and then religated at low (4 µg/ml) DNA concentration,
and plasmid pAB113 was obtained in which three 63 bp
HindIII fragments have been deleted from the α-factor
structural gene, leaving only a single copy of mature
α-factor coding region. A BamHI site was added to
plasmid pAB11 by cleavage with EcoRI, filling in of the
overhanging ends by the Klenow fragment of DNA
polymerase, ligation of BamHI linkers, cleavage with
BamHI and religation to obtain pAB12. Plasmid pAB113
was digested with EcoRI, the overhanging ends filled
in, and ligated to BamHI linkers. After digestion with
BamHI the 1500 bp fragment was gel-purified and ligated
to pAB12 which had been digested with BamHI and treated
with alkaline phosphatase. Plasmid pAB114, which
contains a 1500 bp BamHI fragment carrying the α-factor
gene, was obtained. The resulting plasmid (pAB114
containing the above described construct) is then
digested with BamHI and ligated into plasmid pC1/1.

The resulting yeast plasmid is named pYαEGF-23
and was deposited at the American Type Culture Collection
on 12th August 1983 under ATCC Accession no. 40079.

Construction (d), in which a new KpnI site
was generated, was made as described for construction
(c) except that the 36-mer oligonucleotide primer of
sequence 5'-GGGTACCTTTGGATAAAAGAAACTCCGACTCCGAAT-3' was
used. The resulting yeast plasmid is named pYαEGF-24.

Construction (e) was derived by digestion of the
plasmid containing construction (d) with KpnI and SalI
instead of linker 1 and 2. The resulting yeast plasmid
is named pYαEGF-25.
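
The new KpnI site introduced by the construction (d) primer can be located with a simple string search. This is a sketch only: the primer is transcribed with T at the fourth position, which is what the KpnI recognition sequence GGTACC requires and should be treated as an assumption, and the helper name `find_sites` is hypothetical.

```python
KPNI_SITE = "GGTACC"  # KpnI recognition sequence

# 36-mer from construction (d); the fourth base is read as T (assumption).
PRIMER_D = "GGGTACCTTTGGATAAAAGAAACTCCGACTCCGAAT"


def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


positions = find_sites(PRIMER_D, KPNI_SITE)
print(len(PRIMER_D), positions)
```

The same helper can be pointed at other recognition sequences, for example AAGCTT for HindIII or GGATCC for BamHI.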

Yeast cells transformed with pYαEGF-22 were
grown in 15 ml cultures. At the indicated densities or
times, cultures were centrifuged and the supernatants
saved and kept on ice. The cell pellets were washed in
lysis buffer (0.1% Triton X-100, 10 mM NaHPO4 pH 7.5) and
broken by vortexing (5 min in 1 min intervals with
cooling on ice in between) in one volume of lysis
buffer and one volume of glass beads. After centrifugation,
the supernatants were collected and kept on ice.
The amount of hEGF in the culture medium and cell
extracts was measured using the fibroblast receptor
binding competition assay. Standard curves were
obtained by measuring the effects of increasing quantities
of mouse EGF on the binding of a standard amount
of 125I-labeled mouse EGF.
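
Reading an unknown sample off such a standard curve can be sketched as follows. The standard points, their units and the function name are hypothetical values chosen for illustration and are not data from this work; interpolation on a logarithmic scale is likewise an assumption about how the curve might be read.

```python
import math

# Hypothetical standard curve:
# (amount of unlabeled mouse EGF, fraction of labeled EGF still bound).
STANDARDS = [(0.1, 0.95), (1.0, 0.70), (10.0, 0.30), (100.0, 0.05)]


def estimate_egf(bound_fraction: float) -> float:
    """Interpolate the competitor amount on a log scale between bracketing standards."""
    for (x_lo, y_hi), (x_hi, y_lo) in zip(STANDARDS, STANDARDS[1:]):
        if y_lo <= bound_fraction <= y_hi:
            t = (y_hi - bound_fraction) / (y_hi - y_lo)
            log_x = math.log10(x_lo) + t * (math.log10(x_hi) - math.log10(x_lo))
            return 10 ** log_x
    raise ValueError("bound fraction lies outside the standard curve")


print(round(estimate_egf(0.50), 2))
```

A sample whose bound fraction falls outside the range of the standards raises an error rather than being extrapolated; such a sample would need to be diluted or concentrated and measured again.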

Proteins were concentrated from the culture
media by absorption on Bio-Rex 70 resin and elution
with 0.01 HCl in 80% ethanol and purified by high
performance liquid chromatography (HPLC) on a reverse
phase C18 column. The column was eluted at a flow rate
of 4 ml/min with a linear gradient of 5% to 80% acetonitrile
containing 0.2% trifluoroacetic acid in 60 min.
Proteins (200-800 picomoles) were sequenced at the
amino-terminal end by the Edman degradation method
using a gas-phase protein sequencer, Applied Biosystems
model 470A. The normal PROTFA program was used for all
the analyses. Dithiothreitol was added to S2 (ethyl
acetate: 20 mg/liter) and S3 (butyl chloride: 10 mg/liter)
immediately before use. All samples were treated with
1 N HCl in methanol at 40° for 15 min to convert PTH-
aspartic acid and PTH-glutamic acid to their methyl
esters. All PTH-amino acid identifications were
performed by reference to retention times on an IBM CN
HPLC column using a known mixture of PTH-amino acids as
standards.
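
The linear HPLC gradient described above (5% to 80% acetonitrile over 60 min at 4 ml/min) can be expressed as a short calculation. The function names are illustrative, and clamping the time to the run length is an assumption of this sketch.

```python
def acetonitrile_percent(t_min: float, start: float = 5.0, end: float = 80.0,
                         duration_min: float = 60.0) -> float:
    """Percent acetonitrile at time t for a linear gradient, clamped to the run."""
    t = min(max(t_min, 0.0), duration_min)
    return start + (end - start) * t / duration_min


def eluted_volume_ml(t_min: float, flow_ml_per_min: float = 4.0) -> float:
    """Cumulative volume delivered at a constant flow rate."""
    return flow_ml_per_min * t_min


for t in (0, 30, 60):
    print(t, acetonitrile_percent(t), eluted_volume_ml(t))
```

With these parameters the gradient rises by 1.25 percentage points per minute, and the full run delivers 240 ml of eluent.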

Secretion from pYαEGF-22 gave a 4:1 mole
ratio of native N-terminus hEGF to glu-ala terminated
hEGF, while secretion from pYαEGF-23-25 gave only
native N-terminated hEGF. Yields of hEGF ranged from 5
to 8 pg/ml measured either as protein or in a receptor
binding assay.

The strain JRY188 (MAT sir3-8 leu2-3 leu2-112
trp1 ura3 his4 rme) was transformed with pYαEGF-21 and
leucine prototrophs selected at 37°. Saturated
cultures were then diluted 1/100 in fresh medium and
grown in leucine selective medium at permissive (24°)
and non-permissive (36°) temperatures and culture
supernatants were assayed for the presence of hEGF as
described above. The results are shown in the
following table.

Regulated synthesis and secretion of hEGF in transformed
yeast sir3 temperature-sensitive mutants.

Temperature   Transformant   O.D.650   hEGF (pg/ml)

36°           3a             3.3       [illegible]
              3b             3.6       0.020
                             6.4       0.024

24°           3a             0.4       34
                             1.3       145
                             2.1       1075
                             4.0       3250
              3b             0.4       32
                             1.4       210
                             2.2       1935
                             4.2       4600



These results indicate that the hybrid
α-factor/EGF gene is being expressed under mating type
regulation, even though it is present on a high copy
number plasmid.
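
The extent of this regulation can be put into numbers from the table. The sketch below uses only the values listed for transformant 3b and compares the highest hEGF value at each temperature; choosing peak values as the basis for comparison is an assumption of this sketch, and the names are illustrative.

```python
# (O.D.650, hEGF) pairs for transformant 3b, copied from the table above.
HEGF_3B = {
    36: [(3.6, 0.020), (6.4, 0.024)],
    24: [(0.4, 32), (1.4, 210), (2.2, 1935), (4.2, 4600)],
}


def peak_titer(temp_c: int) -> float:
    """Highest hEGF value recorded at the given temperature."""
    return max(hegf for _, hegf in HEGF_3B[temp_c])


# Ratio of permissive (24°) to non-permissive (36°) peak values.
fold = peak_titer(24) / peak_titer(36)
print(f"{fold:.0f}-fold")
```

On this basis the permissive-temperature cultures accumulate roughly five orders of magnitude more hEGF than the non-permissive ones.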

In accordance with the subject invention,
novel constructs are provided which may be inserted
into vectors to provide for expression of polypeptides
having an N-terminal leader sequence and one or more
processing signals to provide for secretion of the
polypeptide as well as processing to result in a mature
polypeptide product free of superfluous amino acids.
Thus, one can obtain a polypeptide having the identical
sequence to a naturally occurring polypeptide. In
addition, because the polypeptide can be produced in
yeast, glycosylation can occur, so that products can be
obtained which are identical to the naturally occurring
products. Furthermore, because the product is secreted,
greatly enhanced yields can be obtained based on cell
population and processing and purification are greatly
simplified. In addition, employing mutant hosts,
expression can be regulated to be turned off or on, as
desired.

Although the foregoing invention has been
described in some detail by way of illustration and
example for purposes of clarity of understanding, it
will be obvious that certain changes and modifications
may be practiced within the scope of the appended
claims.

CLAIMS

1. A DNA construct encoding a pre-pro-polypeptide,
said DNA construct encoding pre-pro-polypeptide
comprising a yeast leader sequence, processing signals
for processing the pre-pro-polypeptide to a mature
polypeptide and a gene encoding a polypeptide other than the
wild type gene associated with said leader sequence.

2. A DNA construct according to Claim 1,
including at the 5' end of the sequence a yeast promoter
and wherein said gene is heterologous to said
yeast host.

3. A DNA construct according to Claim 2,
wherein said yeast promoter is the α-factor promoter
and said yeast leader is a leader sequence encoding for
at least a major portion of the α-factor leader and is
capable of providing for secretion.

4. A DNA construct according to Claim 2, 
wherein said gene is a mammalian gene. 

5. A DNA construct comprising a sequence of
the following formula:

L-((R)r-(GAXYCX)n-Gene*)y

wherein:

L is a leader sequence recognized by yeast
for secretion;

R is a codon coding for arginine or lysine;

r is an integer of from 2 to 4;

X is any nucleotide;

Y is guanosine or cytosine;

y is an integer of from about 1 to 10;

Gene* is a gene foreign to yeast; and

n is 0 or 1 to 4.

6. A DNA construct according to Claim 5,
wherein n is 0 to 4 and the nucleotides of said Gene*
proximal to R at least in part define a recognition
site for a restriction endonuclease.

7. A DNA construct according to Claim 6,
wherein said leader sequence is the α-factor leader
sequence.

8. A DNA construct according to Claim 7,
wherein n is 0.

9. A DNA construct of the formula:

Tr-L-(R-S-(GAXYCX)n'-W-(Gene*)d)y

wherein:

Tr is a sequence having transcriptional and
translational regulatory signals for initiation and
processing of transcription and translation, wherein
said regulatory signals are recognized by yeast;

L is a leader sequence for secretion by
yeast;

R and S are codons expressing arginine and
lysine;

X is any nucleotide;

Y is cytosine or guanosine;

y is an integer of from 1 to 4;

n' is a whole number of from 0 to 4;

W is a deoxyribosyl-3' group or when n' is
other than 0, one or more nucleotides which by themselves
or together with the hexanucleotide in the parenthesis
define a restriction site;

Gene* either by itself or taken together with
W defines a polypeptide sequence foreign to yeast; and

d is 0 or 1, being 1, when y is greater than
1.

10. A DNA construct according to Claim 9,
wherein Tr is a sequence defining the regulatory
signals for α-factor, d is 1 and Gene* and W are taken
together to define a polypeptide foreign to yeast.

11. A DNA construct according to Claim 9
wherein n' is 0.

12. A DNA construct according to Claim 11,
wherein said polypeptide product is a mammalian
polypeptide.

13. A DNA construct comprising a sequence of
the formula:

(Tr)a-L-R-S-(GAXYCX)n"-GA|AGCT|

wherein:

Tr is a sequence defining transcriptional and
translational regulatory signals for initiation and
processing of transcription and translation recognized
by yeast;

a is 0 or 1;

L is a leader sequence recognized by yeast;

R and S are codons encoding for lysine and
arginine;

X is any nucleotide;

Y is cytosine or guanosine;

n" is 2 to 4;

the nucleotides in the broken box (shown here between
vertical bars) indicate the nucleotides which are
complementary to the overhang of the non-coding chain
to define a HindIII restriction site.

14. A DNA construct according to Claim 13,
where Tr is a sequence defining the transcriptional and
translational regulatory signals of α-factor.

15. An expression episomal element comprising
a replication system for providing stable maintenance
in yeast and a sequence of the formula:

Tr-L-(R)r'-(GAXYCX)n'-W-Te

wherein:

Tr is a sequence defining transcriptional and
translational regulatory signals for initiation and
processing of transcription and translation in yeast;

L is a leader sequence recognized by yeast
for secretion;

R is a codon defining arginine or lysine;

r' is a whole number in the range of 2 to 4;

X is any nucleotide;

Y is cytosine or guanosine;

n' is a whole number in the range of 0 to 4;

W is a nucleotide sequence of at least 1
nucleotide, which by itself or when n' is other than 0,
in conjunction with nucleotides in the parenthesis
defines a restriction site;

Te is a sequence defining a terminator
balanced with said transcriptional initiator sequence.

16. An expression episomal element according
to Claim 15 wherein Tr is derived from α-factor and n'
is 2 to 3.

17. An expression episomal element according
to Claim 14, wherein Tr is derived from α-factor and n'
is 0.



18. An episomal expression vector according
to Claim 17, having a gene foreign to yeast intermediate
R and Te and in reading frame with the initiation codon
of L.

19. An episomal expression element according
to Claim 18, wherein said gene is a mammalian gene.

20. An episomal element according to Claim
19, wherein said mammalian gene is human epidermal
growth factor.

21. An episomal expression vector according 
to Claim 16, having a gene foreign to yeast intermediate 
the nucleotides in the parentheses and Te and in 
reading frame with the initiation codon of L. 

22. A method for producing a polypeptide 
foreign to yeast and having such polypeptide secreted 
into the culture medium, said method comprising: 

growing yeast containing an episomal expression
element according to Claim 16, whereby the
encoding sequences are expressed to produce a
pre-pro-polypeptide; and

said pre-pro-polypeptide is at least partially 
processed and secreted. 

23. An episomal expression vector according
to Claim 17, having a gene foreign to yeast intermediate
the nucleotides in the parentheses and Te and in 
reading frame with the initiation codon of L. 

24. A method for producing a polypeptide
foreign to yeast and having such polypeptide secreted
into the culture medium, said method comprising:

growing yeast mutants containing an episomal
expression element according to Claim 16, wherein said
mutant permits external regulation of expression,
whereby the encoding sequences are expressed to produce
a pre-pro-polypeptide under permissive conditions; and

said pre-pro-polypeptide is at least partially
processed and secreted.

25. A method according to Claim 24, wherein
said mutant yeast is a temperature-sensitive sir
mutant.

26. A method for producing a polypeptide
foreign to yeast and having such polypeptide secreted
into the culture medium, said method comprising:

growing yeast mutants containing an episomal
expression element according to Claim 17, wherein said
mutant permits external regulation of expression,
whereby the encoding sequences are expressed to produce
a pre-pro-polypeptide under permissive conditions; and

said pre-pro-polypeptide is at least partially
processed and secreted.

27. A method according to Claim 26, wherein
said mutant yeast is a temperature-sensitive sir
mutant.
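
The processing-signal portion of the formula in Claim 5, (R)r-(GAXYCX)n, can be illustrated with a small parser. This is a sketch under stated assumptions and not a statement of claim scope: the input is assumed to begin in frame at the first R codon, the test sequences are hypothetical examples, and the function name is illustrative.

```python
import re

# Codons for arginine and lysine (R in the formula).
ARG_LYS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG", "AAA", "AAG"}

# GAXYCX: X is any nucleotide, Y is guanosine or cytosine.
HEXANUCLEOTIDE = re.compile(r"GA[ACGT][CG]C[ACGT]")


def parse_signal(seq: str) -> tuple[int, int, str]:
    """Return (r, n, remainder) for an in-frame sequence starting at the first R codon."""
    i = 0
    r = 0
    while r < 4 and seq[i:i + 3] in ARG_LYS:
        r += 1
        i += 3
    n = 0
    while n < 4 and HEXANUCLEOTIDE.fullmatch(seq[i:i + 6]):
        n += 1
        i += 6
    return r, n, seq[i:]


# Hypothetical example: Lys-Arg, two Glu-Ala repeats, then the start of a gene.
print(parse_signal("AAAAGAGAAGCTGAAGCTAACTCC"))
# Hypothetical example: Lys-Arg joined directly to the gene (n = 0).
print(parse_signal("AAAAGAAACTCC"))
```

A sequence fits the ranges recited in Claim 5 when the returned r lies between 2 and 4 and n lies between 0 and 4.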
[Figure: plasmid construction scheme. Recoverable labels: 1. HindIII, 2. SalI; T4 DNA ligase; hEGF gene; Linker-1 5'-AGCTGAAGCT-3' / 3'-CTTCGATTGAG-5'; Linker-4 5'-TGAGATGATAAG-3' / 3'-ACTATTCAGCT-5'; BamHI; EcoRI; 1. EcoRI, 2. DNA polymerase (Klenow), 3. BamHI linkers, 4. BamHI, 5. T4 DNA ligase.]